1996-10-16
CAS -- The 8051 C-Assembler
(0) Introduction
(a) Features
This is a free, full-featured one-pass 8051 assembler; it could very well be
the first one-pass assembler for the popular MCS-51 family of microprocessors.
What you get are the following features:
* Separately assemblable files. There are two stages of assembly:
- Pass 1: Creation of object files
- Pass 1 1/3: Linking of object files
* Segmentation
- RELATIVE ADDRESSING supported for all segment types
* Conditional assembly, with a C-like syntax. Example:
if (Condition) {
Assembly instructions...
} else {
Assembly instructions...
}
* Multiple statements per line with C-like syntax.
* C-like expression syntax.
* Command-line options similar to those of *NIX C compilers.
* An extensive archive of real-life assembly language programs,
including a multi-tasking library and an 8051 disassembler.
Plus, if you don't want to learn all the elaborate ins and outs of this
tool right away, it is just as easy to use the first time out as any minimal
assembler.
You simply will not find anything this extensive anywhere in the public
domain. But it's yours, here, for free.
Also in the works: a compatible 8051 simulator kit for software developers.
What makes this kit unique is that you can (and usually must) link in your own
C code to define any arbitrary 8051 environment at all. This gives you the
flexibility to simulate the 8051 in your favorite embedded application and to
even simulate the I/O on a desktop. A Standard Environment file is included
with the package.
(b) Design Philosophy ... everything is done in one pass.
A clean distinction is made between the two phases of assembly: (a) creating
segments and formatting image files, (b) mapping segments and resolving
references to variable addresses.
An assembly language program will normally consist of a set of assembly
language modules (or source files). Each will typically be named with the
suffix ".s". In addition, there will also be a set of files, with names ending
in ".h" whose purpose is to provide common points of reference for declarations
of objects in or related to modules. They are incorporated in *.s files using
the "include" directive.
The first stage of assembly will create OBJECT files, whose names end in
".o": one for each assembly language module. For instance, a module named
Kernel.s will be assembled to the object file Kernel.o.
The second stage will take all the object files that have been created
and LINK them together. This process will consist mainly of completing the
definitions of variables defined in one module and used in another, and of
mapping the memory segments defined in each module onto a memory image.
These two stages correspond roughly to the first and second pass of a
traditional two-pass assembler. But there are now two major differences:
(a) the second stage can now be deferred. It is possible to assemble object
files only, and defer the linking phase. Furthermore, it is possible
to use the SAME object file in more than one project.
(b) the second stage is now considerably shortened compared to the second
pass of a traditional two-pass assembler because object files tend to
be much smaller than source files and because the assembler no longer
has to process the assembly language itself by the second stage.
(1) Command line arguments
The cas assembler's command line basically follows that of a typical C
compiler. In the examples:
(a) cas -c kernel.s
(b) cas -c math.s data.s stdio.s kernel.s
(c) cas math.s data.s stdio.s kernel.s
(d) cas -o data.hex math.o data.s stdio.o kernel.s
(a) will assemble the file kernel.s, creating kernel.o.
(b) will assemble all the files listed, creating .o files in the process.
If a .o file is listed with the -c option, it is ignored.
(c) will assemble all the files listed, as in (b), and then link all the
corresponding .o files. The output file will take the same base name
as the first file listed, and will have the suffix .hex. Therefore, the
output in this example will be math.hex.
(d) will do the same as (c), but will name the output file data.hex.
If a .o file is listed in either of these two command lines it will be
ignored during assembly, but will be used during linking.
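The naming rule described in (c) and (d) can be sketched as follows. This is
only an illustration in Python, not the assembler's own code; the function
name output_name is hypothetical, and the sketch assumes that -c and -o are
the only options present on the command line.

```python
import os

def output_name(args):
    """Return the name of the linked output file, or None when -c is given."""
    if "-c" in args:
        return None                        # -c: assemble only, nothing is linked
    if "-o" in args:
        return args[args.index("-o") + 1]  # -o names the output explicitly
    files = [a for a in args if not a.startswith("-")]
    base, _ext = os.path.splitext(files[0])
    return base + ".hex"                   # base name of the first file listed

print(output_name(["math.s", "data.s", "stdio.s", "kernel.s"]))
```

With the arguments of example (c) this gives math.hex, and with those of
example (d) it gives data.hex.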
(2) Directives
The following is a summary of the directives available in this language.
(a) FILE INCLUSION -- include "FILE"
This command will read the contents of the file named (FILE) into the
current location of the current file. By convention, include files should
have names ending in ".h" or ".i" and should only consist of declarations.
Include files generally serve two purposes: to provide a place to store
related constant definitions and declarations, and to declare the globally
visible objects of an assembly language module.
(b) Setting current SEGMENT and LOCATION -- seg, at, org
At any point in scanning a *.s assembly language file, the assembler will
recognize a current segment and current location. The latter can be referred
to by the user as $.
To see how these items can be set, look at the following examples:
seg code
seg xdata at 0x8000
seg xdata org 0x8000
org 50
at 50
The first example sets the current segment to the type "code". The current
location is left unspecified. THIS IS HOW RELATIVE ADDRESSING IS INITIATED.
The actual address of the segment's start will not be determined when the
object file is created, but is deferred until the object file is linked.
Why do things this way? One simple reason: MODULARITY. You can now
define your own assembly language module, and convert it into an object
file ready to be linked in with the rest of whatever program might be
using it. You don't have to worry about the exact address where your
memory segments will be located each time you include this module in a new
program. This makes it possible to create reusable libraries of common
assembly language functions.
The second and third examples do exactly the same thing because "at" and
"org" are synonymous. The latter is included only for compatibility with
other assembly language programs and for familiarity's sake, but I strongly
recommend using the former. It simply reads nicer.
The effect of this operation is to set the current segment to "xdata" and
the current location to 0x8000.
The last two examples are equivalent to one another and set the current
location to 50 without changing the current segment.
At the very start of assembly, the current segment is set to the first
segment ("code"), and the address is left indefinite. When different modules
are linked together, the linker will attempt to take all the segments of each
type and place them in non-overlapping areas of memory, shifting the relative
segments around as needed to accomplish this goal.
What if you want to control the placement of objects, say to exclude
addresses 0 to 4000 hex? An easy way is to simply write up a module to the
effect:
seg code at 0
ds 4000h
assemble it separately and link it in with any program where you want to
reserve this address space.
The linker tries to place your segments in exclusive areas in as tight a
fit as possible. So this module will result in the address space 0 to 4000
being excluded from the rest of your program.
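To see why a fixed "blocker" segment keeps everything else out of that range,
here is a rough Python sketch of one possible placement strategy. The
linker's real algorithm is not described in this document; the first-fit rule
below and the names place, absolute and relative_sizes are assumptions made
purely for illustration.

```python
def place(absolute, relative_sizes):
    """Pick start addresses for relative segments of a single type.

    absolute       -- list of (start, size) for segments fixed with "at"
    relative_sizes -- sizes of the relative segments, in link order
    Assumption: each relative segment goes to the lowest free address.
    """
    placed = sorted(absolute)
    starts = []
    for size in relative_sizes:
        addr = 0
        for start, length in placed:
            if addr + size <= start:
                break                         # fits in the gap before this one
            addr = max(addr, start + length)  # otherwise skip past it
        starts.append(addr)
        placed.append((addr, size))
        placed.sort()
    return starts

# One absolute segment made by "seg code at 0" / "ds 4000h",
# followed by two relative code segments.
print([hex(a) for a in place([(0, 0x4000)], [0x100, 0x20])])
```

Under this assumed strategy, both relative segments land at or above 4000
hex, which is the effect the reservation module is meant to have.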
The segment types supported by this 8051 assembler are the following:
* code --- the 8051 code address space, ranges from 0 to ffff hex.
* xdata -- the external data address space, same range.
* data --- the internal data/register space. Ranges from 0 to ff.
Only addresses under 80 hex can be used in mnemonics
involving direct addressing.
Other segment types are internally used by the assembler. They are:
* sfr ---- the Special Function Register space -- ranges from 80 to ff.
* bit ---- the bit-addressable address space. These comprise the
individual bits in registers 20(hex) to 2f(hex), and the
sfr addresses (hexadecimal) 80, 88, 90, 98, ..., f0, f8.
Defining a new segment with one of these types will result in an error.
(c) Defining new LABELS -- LABEL equ Exp, LABEL Type Exp, LABEL:
LABEL set Exp, LABEL = Exp
These operations are defined as follows:
LABEL equ Exp
defines a constant value LABEL and sets it to the value Exp.
LABEL Type Exp
defines a constant address "LABEL" of the indicated type and
sets it to the address given by "Exp". The types recognized
by this assembler are: code, xdata, data, sfr, and bit.
LABEL:
sets a constant address "LABEL" to the current address in the
current segment.
LABEL set Exp
defines a variable, LABEL, and sets it to the value Exp.
LABEL = Exp
the same thing as "set".
The following assembly language fragment is an illustration of these
operations:
seg code at 0
Start: ds 0x4000
Size equ $ - Start
End code Start + Size
The first statement sets the current segment and location to "code" and 0.
The next statement is preceded by the label, "Start:". This is equivalent
to the statement:
Start code $.
What it does is define "Start" as a code address, and sets it to the current
location (which is 0). Following this is an instruction to reserve 4000(hex)
units (bytes) of storage. After this operation, the current location is now
0x4000.
The third instruction sets the numerical constant "Size" to 0x4000 - 0, or
just 0x4000. The final directive defines a code address with the name "End"
and sets it to the address Start + Size (or just 0x4000).
Variables differ from constants in that they can be redefined. Constants
cannot be redefined.
(d) Numeric labels
One can also define anonymous numeric labels, as in the following example:
1: cjne A, #0, 1f
inc A
movx @DPTR, A
inc DPTR
mov A, @R1
inc R1
jz 2f
sjmp 1b
1: setb C
ret
2: clr C
ret
Each occurrence of "1:" stands for a unique anonymous label, likewise for
"2:". Any number may be used in this way to denote an anonymous label.
When a label is referenced by the number followed by an "f", then the
first matching numeric label IN THE CURRENT SEGMENT forward of the current
location is being referred to. In the example above, 1f and 2f refer
respectively to the occurrences of 1: and 2: toward the end of the example.
When a label is referenced by the number followed by a "b", then the
first matching numeric label IN THE CURRENT SEGMENT behind the current
location is being referred to. In the example above, 1b refers to the
1: at the top of the example.
Thus, this fragment is equivalent to the following:
X1: cjne A, #0, Y1
inc A
movx @DPTR, A
inc DPTR
mov A, @R1
inc R1
jz Y2
sjmp X1
Y1: setb C
ret
Y2: clr C
ret
This feature saves you from the burden of defining needless names for
labels that really serve as nothing more than place-holders.
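The lookup rule for "f" and "b" references can be sketched in Python as
below. This is an illustration only, not the assembler's code: the function
name resolve is hypothetical, and positions are statement indices within one
segment (0 for the cjne line, 11 for the final ret) rather than real
addresses. The sketch also assumes that a "b" reference may match a label
attached to the very statement containing the reference, while an "f"
reference may not.

```python
def resolve(ref, position, labels):
    """Find the label a reference such as '1f' or '1b' points to.

    labels   -- list of (number, position) for the current segment
    position -- position of the statement that contains the reference
    """
    number, direction = int(ref[:-1]), ref[-1].lower()
    if direction == "f":
        # nearest matching label strictly forward of the reference
        return min(p for n, p in labels if n == number and p > position)
    # nearest matching label at or behind the reference
    return max(p for n, p in labels if n == number and p <= position)

# Labels from the example: "1:" on statement 0, "1:" on 8, "2:" on 10.
labels = [(1, 0), (1, 8), (2, 10)]
print(resolve("1f", 0, labels), resolve("2f", 6, labels), resolve("1b", 7, labels))
```

The three references resolve to statements 8, 10 and 0, matching the
renamed labels Y1, Y2 and X1 in the equivalent listing.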
(e) Declaring GLOBAL labels -- global, public
Any constant directive:
LABEL equ Exp
LABEL Type Exp
LABEL:
can be prefixed by "global" or "public" to result in:
global LABEL equ Exp
global LABEL Type Exp
global LABEL:
or
public LABEL equ Exp
public LABEL Type Exp
public LABEL:
What this does is to make these labels visible to modules other than the one
where these labels are defined. By default, all labels are visible only in
the file where they are defined.
(f) Declaring EXTERNAL labels -- extern Type LABEL, ..., LABEL
extern equ LABEL, ..., LABEL
For each global label defined in a *.s module file, a corresponding
external declaration should be made in whatever other module this
label is to be used. Typically, one will make these and other related
declarations in a *.h file and then INCLUDE this file in whatever module needs
the declarations. The type must match the type of the label being referenced,
if it is an address, or it must be "equ" if the label referenced was a numeric
constant.
For example if one declared global labels in a module Kernel.s as follows:
public STACK_BASE data 0x80
...
seg code
public Spawn:
....
public Resume:
...
one would generally make the corresponding declarations:
extern data STACK_BASE
extern code Spawn, Resume
in a header file (say, Kernel.h), and then include this file in any source
module where the addresses STACK_BASE and Spawn might be needed.
(g) Memory ALLOCATION -- ds, rb, rw
The following operations can be used in any segment. They are generally
used to allocate space for objects and so are generally used in conjunction
with "LABEL:" type definitions. These are examples:
seg code at 0
BASIC_SEG: ds 0x4000
seg xdata
Byte: ds 1
ByteArray: rb 5
WordArray: rw 5
The first example reserves 0x4000 units (bytes) in the current segment for
the variable BASIC_SEG and then increments the current location by 0x4000.
Basically, this operation behaves as if the assignment "$ = $ + 0x4000" had
just been carried out.
"ds" and "rb" are exactly equivalent, but the latter more descriptively
states: reserve single-byte units. So the second example reserves 1 byte for
the variable "Byte", and 5 bytes for "ByteArray".
NO MEMORY IMAGE IS GENERATED FOR ANY SPACE SKIPPED BY ds/rb/rw.
The third example is equivalent to:
WordArray: rb 10
Each unit following a "rw" is a word, which consists of two bytes.
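The effect of these directives on the location counter can be sketched in
Python. The names UNIT_SIZE and advance are hypothetical, and the starting
address 0 for the xdata segment is chosen only for the example (a relative
segment has no fixed start until link time).

```python
# Bytes reserved per unit, as described above: ds and rb count bytes,
# rw counts two-byte words.
UNIT_SIZE = {"ds": 1, "rb": 1, "rw": 2}

def advance(location, directive, count):
    """Return the location counter after 'directive count'."""
    return location + UNIT_SIZE[directive] * count

loc = 0                       # assumed start of the xdata segment
byte_addr = loc
loc = advance(loc, "ds", 1)   # Byte:      ds 1
byte_array_addr = loc
loc = advance(loc, "rb", 5)   # ByteArray: rb 5
word_array_addr = loc
loc = advance(loc, "rw", 5)   # WordArray: rw 5
print(byte_addr, byte_array_addr, word_array_addr, loc)
```

Relative to the segment start, Byte sits at offset 0, ByteArray at 1,
WordArray at 6, and the next free location is 16.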
(h) Memory FORMATTING -- db, dw
These operations can be used in the code segment only. They are the only
directives that can generate memory images. The only other operations that
generate memory image output are the 8051 mnemonics, which likewise are
restricted to the code segment only.
The main purpose served by these operations is to initialize data.
Examples:
ByteArray: db 'a', 'b', 'c', 'd', 'e'
String: db "This is a string", 0
In the following examples:
db 0x20, "String", 'c'
dw 0x1234, 0x5678
the first operation lays out the byte 0x20 and equivalent character codes
for 'S', 't', 'r', 'i', 'n', 'g', and 'c' in that order. The current
location is then incremented by 8 to the location following the last item.
The second operation is equivalent to the following:
db 0x12, 0x34, 0x56, 0x78
It formats 2-byte word units into memory.
Both db and dw can be followed by a comma-separated series of numeric values
or addresses. In addition, db can accept strings, as shown in the examples
above.
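The byte layout produced by the two operations can be sketched in Python.
The helper names db and dw mirror the directives but are otherwise
hypothetical; the sketch assumes ASCII character codes and, following the
dw example above, stores the high byte of each word first.

```python
def db(*items):
    """Lay out bytes: integers become one byte, strings one byte per character."""
    out = bytearray()
    for item in items:
        if isinstance(item, str):
            out += item.encode("ascii")
        else:
            out.append(item & 0xFF)
    return bytes(out)

def dw(*words):
    """Lay out 2-byte words, high byte first."""
    out = bytearray()
    for w in words:
        out += bytes([(w >> 8) & 0xFF, w & 0xFF])
    return bytes(out)

print(len(db(0x20, "String", "c")), dw(0x1234, 0x5678).hex())
```

The db call yields 8 bytes, which is why the location counter moves by 8,
and the dw call yields the same image as db 0x12, 0x34, 0x56, 0x78.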
(i) CONDITIONAL assembly -- if (Ex) ST, if (Ex) ST else ST
These statements are used to selectively assemble different sets of
statements. For example
if (STAND_ALONE) {
at 0x03
mov R0, #SP_IE0
acall Pause
reti
} else {
at 0x4003
pop PSW
mov R0, #SP_IE0
acall Pause
reti
}
will assemble the first set of statements (at 0x03 ... reti) if the label
STAND_ALONE is anything other than 0, and the second set (at 0x4003...reti) if
the label is 0.
An example with the exact same effect could be written as:
if (STAND_ALONE) SEG equ 0; else SEG equ 0x4000
at SEG + 3
if (!STAND_ALONE) pop PSW
mov R0, #SP_IE0
acall Pause
reti
Both the if and else part of the conditional will accept only one statement.
If more than one statement needs to be included, as in the first example, then
they can be grouped within curly braces.
(j) Statement GROUPING -- { ... }, multiple statements on a line.
Any sequence of statements enclosed within a matching set of curly brackets
is treated as a single statement. It can then be used in the body of any
conditional just like any single statement can.
SPECIAL NOTES ON STATEMENT FORMATTING:
(A) ALL STATEMENTS (a) THROUGH (h) MUST END IN SEMICOLONS.
However, this semicolon can be elided if it is the last item on a line. This
allows compatibility with more traditional one-statement-on-a-line type
assemblers. So normally, you don't have to even concern yourself with this
if you adhere to one-statement per line style.
(B) A BASIC STATEMENT ((a) THROUGH (h)) MUST BE WRITTEN ALL ON ONE LINE
It cannot be split up into two or more lines.
(C) ALL COMMENTS ARE IN C++ STYLE.
Many assemblers use the semicolon to initiate comments. I have decided
against this feature in favor of making this assembler more compatible with C++
syntax. Comments occur in the following two forms:
(a) Anything included between a matching pair /* ... */
(b) Anything included between a // and end of line.
However, for increased compatibility, I also allow the following format:
(c) Anything included between a ;; and end of line.
My personal style is to precede comments with a ;;;, so none of this impinges
on the software included in the archive with the assembler.
There is a short C-program included that will blindly convert all single
semicolons to double semicolons. Since I've observed that semicolons rarely
occur inside string or character constants in actual 8051 programs, this should
ALMOST always be sufficient to resolve any incompatibilities with your older
assembly language programs.
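The conversion itself is simple enough to sketch. The version below is in
Python rather than C and is not the program shipped in the archive; the
function name double_semicolons is hypothetical. Like the tool described
above, it is "blind": it does not look inside string or character constants.

```python
import re

def double_semicolons(line):
    """Turn every lone ';' into ';;' and leave runs of two or more alone."""
    return re.sub(r"(?<!;);(?!;)", ";;", line)

print(double_semicolons("mov A, #1 ; load the accumulator"))
```

A line such as db ";" would also be changed (to db ";;"), which is the rare
case the word ALMOST above is warning about.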
(n) What goes in a *.s file, what goes in a *.h file?
Generally speaking, declarations should be placed in a *.h header file.
The design of this assembler (especially with it being a one-pass assembler) is
intended to support this usage. Any of the following is a declaration:
(c) Defining new LABELS -- LABEL equ Exp, LABEL Type Exp
(f) Declaring EXTERNAL labels -- extern Type LABEL, ..., LABEL
extern equ LABEL, ..., LABEL
Declarations only meant to be accessed within one module should be made inside
that module, instead of out in a header file.
The following should be used only in *.s files, as they are generally
(a) used to create memory images, (b) used to define non-global objects, or
(c) used to define address values:
(a) FILE INCLUSION -- include FILE
(b) Setting current SEGMENT and LOCATION -- seg, at, org
(c) Defining new LABELS -- LABEL:
(d) Numeric labels
(e) Declaring GLOBAL labels -- global
(g) Memory ALLOCATION -- ds, rb, rw
(h) Memory FORMATTING -- db, dw
The last two items are generally used in many different contexts, and so can be
used anywhere:
(i) CONDITIONAL assembly -- if (Ex) ST, if (Ex) ST else ST
(j) Statement GROUPING -- { ... }
(3) Expressions
(a) Operators
The syntax is the same as in C. The following operations are defined:
BIT-WISE: ~, &, ^, |, <<, >>
BOOLEAN: !, &&, ||, <, <=, >, >=, ==, !=
CONDITIONAL: ? :
ARITHMETIC: prefix + and -, +, -, *, /, %
CONVERSIONS: high, low, by
BIT CONVERSION: .
The operator precedences are all the same as in C.
The latter two groups, not defined in C, are described in more detail
below. The operators high and low have the same precedence as all the other
prefix operators (+, -, !, and ~). The operators "by" and "." have the highest
precedence of all infix operators, so for example
A * B by C
is resolved as:
A * (B by C)
and
A.B + C
as:
(A.B) + C
Parentheses may be used to enclose expressions as in C, for example:
A + ((B << 2)&(C >> 3))
(b) CONVERSIONS ... high X, low X, H by L
The following examples illustrate these operations:
high 1234h (result: 12h .. the upper byte of the word 1234h)
low 1234h (result: 34h .. the lower byte of the word 1234h)
12h by 34h (result: 1234h)
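In terms of ordinary shifts and masks, the three conversions amount to the
following Python sketch. The function names simply mirror the operators; the
masking of operands to 8 bits in by is an assumption, since the behavior for
out-of-range operands is not stated here.

```python
def high(x):
    """Upper byte of a 16-bit word."""
    return (x >> 8) & 0xFF

def low(x):
    """Lower byte of a 16-bit word."""
    return x & 0xFF

def by(h, l):
    """Combine a high byte and a low byte into one word (operands masked)."""
    return ((h & 0xFF) << 8) | (l & 0xFF)

print(hex(high(0x1234)), hex(low(0x1234)), hex(by(0x12, 0x34)))
```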
(c) BIT-CONVERSION ... Dir.Pos
This is an 8051-specific operation related to the bit-addressing structure of
the processor. The first argument represents a direct data register (of type
"data" and value < 80h, or type "sfr" and value >= 80h). The second represents
a bit position (0, through 7).
The register, Dir, must be bit-addressable. These include only:
data: 20h - 2fh
sfr: 80h, 88h, 90h, 98h, 0a0h, 0a8h, 0b0h, 0b8h,
0c0h, 0c8h, 0d0h, 0d8h, 0e0h, 0e8h, 0f0h, 0f8h
The sfr registers and bit positions generally have meanings defined by the
manufacturer of the 8051 processor and vary between different versions of the
8051. They are not generally free to be defined by the programmer for
arbitrary use. Most of them control or monitor the internal 8051 peripherals.
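On the 8051, the bit address that corresponds to a register and bit position
follows a fixed rule: bits of registers 20h to 2fh occupy bit addresses 00h
to 7fh, and bits of a bit-addressable sfr start at the sfr's own address.
The Python sketch below shows that mapping; the function name bit_address is
hypothetical, and it is an assumption that Dir.Pos evaluates to exactly this
value inside the assembler.

```python
def bit_address(direct, pos):
    """Map a bit-addressable register and a bit position to a bit address."""
    if not 0 <= pos <= 7:
        raise ValueError("bit position must be 0 through 7")
    if 0x20 <= direct <= 0x2F:
        return (direct - 0x20) * 8 + pos   # data registers 20h - 2fh
    if 0x80 <= direct <= 0xFF and direct % 8 == 0:
        return direct + pos                # sfr 80h, 88h, ..., 0f8h
    raise ValueError("register is not bit-addressable")

print(hex(bit_address(0x20, 0)), hex(bit_address(0x2F, 7)), hex(bit_address(0xD0, 7)))
```

For instance, bit 7 of the sfr at 0d0h maps to bit address 0d7h.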
(d) LOCATION COUNTER -- $
A variable address that denotes the current location within the current
segment. NOTE:
dw $, $ - 2, $ - 4
IS EQUIVALENT TO:
dw $; dw $ - 2; dw $ - 4
which is equivalent to:
1: dw 1b; dw 1b; dw 1b
The location counter advances in the middle of a dw or db.
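A small Python sketch makes the arithmetic concrete. The function name
assemble_dw and the starting address 100h are made up for the example; each
expression is passed the value of $ in effect when its word is laid out.

```python
def assemble_dw(start, exprs):
    """Evaluate each dw operand with the current $, advancing $ by 2 per word."""
    location = start
    words = []
    for expr in exprs:
        words.append(expr(location) & 0xFFFF)
        location += 2          # $ moves past the word just laid out
    return words

# dw $, $ - 2, $ - 4  assembled at the assumed address 100h
words = assemble_dw(0x100, [lambda d: d, lambda d: d - 2, lambda d: d - 4])
print([hex(w) for w in words])
```

All three words come out as 100h, the address of the first word, which is
what the 1: dw 1b; dw 1b; dw 1b form says directly.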
(e) NUMERIC CONSTANT
This assembler accepts both the C numeric syntax and the Intel
numeric syntax. The relation between the (extended) C notation and
Intel notation is illustrated below:
HEXADECIMAL: 0xa44f = 0a44fh
0x23 = 23h
DECIMAL: 23 = 23
23 = 23d
OCTAL: 034 = 34q
056 = 56o
BINARY: 0b1001 = 1001b
Upper case may be used anywhere lower case is used, so the above can be written
as:
HEXADECIMAL: 0XA44F = 0A44FH
0X23 = 23H
DECIMAL: 23 = 23
23 = 23D
OCTAL: 034 = 34Q
056 = 56O
BINARY: 0B1001 = 1001B
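One way to recognize both notations is sketched below in Python. This is not
the assembler's scanner: the function name parse_number and the order in
which the forms are tested are assumptions. The "h" suffix is tested first
so that an Intel constant such as 0b1h is read as hexadecimal rather than as
a C-style binary constant.

```python
def parse_number(text):
    """Convert a C-style or Intel-style numeric constant to an integer."""
    t = text.lower()
    if t.endswith("h"):
        return int(t[:-1], 16)      # Intel hexadecimal: 0a44fh, 23h
    if t.startswith("0x"):
        return int(t[2:], 16)       # C hexadecimal: 0xa44f
    if t.startswith("0b") and len(t) > 2:
        return int(t[2:], 2)        # extended C binary: 0b1001
    if t.endswith("b"):
        return int(t[:-1], 2)       # Intel binary: 1001b
    if t.endswith(("q", "o")):
        return int(t[:-1], 8)       # Intel octal: 34q, 56o
    if t.endswith("d"):
        return int(t[:-1], 10)      # Intel decimal: 23d
    if len(t) > 1 and t.startswith("0"):
        return int(t, 8)            # C octal: 034
    return int(t, 10)               # plain decimal: 23

print(parse_number("0xa44f") == parse_number("0A44FH"))
```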
(f) LABELS
Labels may consist of any sequence of letters, the _, and digits, not
starting with a digit. As with numbers, labels are CASE INSENSITIVE. So
all of the following are equivalent:
PPC, PPc, Ppc, pPC
(4) Referencing Expressions
At any time during assembly, a label may be in one of 3 states:
(a) DEFINED and ABSOLUTE:
This is either a numeric label, or a label denoting an address
whose actual value is known.
(b) DEFINED and RELATIVE:
This is a label denoting an address whose location within its
segment is known, but with the segment being relative.
(c) UNDEFINED:
This is a label that is either defined elsewhere in another file,
or defined later on in the file currently undergoing processing.
The following restrictions hold when using expressions:
* Only ABSOLUTE labels can be used in any of the directives:
at/org,
ds/rb, rw
if (...)
* Only DEFINED labels can be used on the right-hand side of any of the
following directives:
Label equ Exp,
LABEL Type Exp
LABEL set Exp, LABEL = Exp
* Any expression can be used with any image generating statement:
Mnemonics
db, dw
If the expression's value is not known at the time of assembly, then the
corresponding location in the image is zeroed out. If the expression's value
becomes known by the time the file is processed, the assembler will go back
and fill in the zero with the appropriate value(s).
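This zero-then-fill-in scheme is what lets a one-pass assembler cope with
forward references. The Python sketch below shows the idea for single-byte
values only; the class name Image and its methods are hypothetical and do
not describe the assembler's internal data structures.

```python
class Image:
    """A tiny memory image with deferred (back-patched) references."""

    def __init__(self):
        self.bytes = bytearray()
        self.symbols = {}
        self.fixups = []            # (offset, label) pairs still to be filled

    def define(self, name, value):
        self.symbols[name] = value

    def emit_byte_ref(self, name):
        """Emit the label's value, or a zero placeholder if it is unknown."""
        if name in self.symbols:
            self.bytes.append(self.symbols[name] & 0xFF)
        else:
            self.fixups.append((len(self.bytes), name))
            self.bytes.append(0)

    def finish(self):
        """At end of file, fill in what is now known; return what is not."""
        unresolved = []
        for offset, name in self.fixups:
            if name in self.symbols:
                self.bytes[offset] = self.symbols[name] & 0xFF
            else:
                unresolved.append((offset, name))
        self.fixups = unresolved
        return unresolved

img = Image()
img.emit_byte_ref("COUNT")          # used before it is defined: zero for now
img.define("COUNT", 0x2A)
img.emit_byte_ref("Spawn")          # never defined in this file
print(img.finish(), img.bytes.hex())
```

In this sketch the reference to COUNT is filled in at the end of the file,
while the one to Spawn stays zero and remains on the list of items to be
completed later, presumably at link time.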
(5) Bugs (or "features")
(a) There is no way to tell the assembler to locate relatively addressed
data registers in the directly addressable space. Consequently you
may receive numerous errors during the linking phase telling you that
such and such registers cannot be directly addressed.
There are basically 2 ways to resolve this: (1) give the registers
absolute addresses, (2) try listing the files in which these registers
are defined first. The linker maps relative segments from the files
in the order you list those files.
In the makefile of the sample program provided (in 8051/assem/data),
the linking phase is done with the command line:
cas -o math.o data.o stdio.o kernel.o
This ordering resolves the problem.
(b) The assembler won't recognize UNIX-style newlines under DOS. Therefore,
a conversion utility (nl.c) has been provided.
(c) No run-time checks are made against the object files processed. A
corrupt object file will crash the assembler during the linking phase.